Incident Report: Persisting Stuck Collector Alerts on Mantle and Blast

Date: 2024-03-15
Time: 22:10 (UTC+5:30)
Duration: approximately 4 days 20 hours (22:10 Mar 15 to 18:29 Mar 20, per the timeline below)

Description

A stuck collector was detected on Mantle and Blast, indicating a significant lag in data collection.

Root Cause

These chains appear to produce blocks at a higher rate than the system was designed to handle. The collectors could not keep pace with the chain head, which makes this a resource constraint problem.

Impact

Data collection for Mantle and Blast was delayed, leaving a growing gap between the current block on each chain and the last block queried by the collector.
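The gap between the current block and the last queried block can be expressed as a simple lag check. The following is a minimal sketch, not the collector's actual code: the `CollectorStatus` type, the function names, and the 1000-block threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectorStatus:
    """Hypothetical snapshot of a collector's progress on one chain."""
    chain: str
    current_block: int       # latest block reported by the chain
    last_queried_block: int  # latest block the collector has processed


def block_lag(status: CollectorStatus) -> int:
    """Number of blocks the collector is behind the chain head (never negative)."""
    return max(0, status.current_block - status.last_queried_block)


def is_stuck(status: CollectorStatus, max_lag_blocks: int = 1000) -> bool:
    """Flag the collector as stuck once its lag exceeds the assumed threshold."""
    return block_lag(status) > max_lag_blocks
```

In practice the threshold would probably be tuned per chain, since a fast chain accumulates a given block lag in much less wall-clock time than a slow one.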

Timeline

  • 22:10 (Mar 15) - First noticed the issue.
  • 04:36 (Mar 16) - Initial diagnosis.
  • 04:39 (Mar 16) - Started the fix.
  • 18:29 (Mar 20) - Issue resolved.

Lessons Learned

The incident highlighted the need for flexible data collection methods and the importance of having fallback mechanisms in place for RPC issues.
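One possible shape for such a fallback mechanism is to try several RPC providers in order and use the first one that answers. This is a sketch under stated assumptions: the function name is hypothetical, and each provider is modeled as a plain callable returning the latest block number rather than a real RPC client.

```python
from typing import Callable, Sequence


def latest_block_with_fallback(providers: Sequence[Callable[[], int]]) -> int:
    """Try each provider in order and return the first successful result.

    Each provider is a zero-argument callable (an assumption for this sketch)
    that returns the latest block number or raises on failure.
    """
    errors = []
    for fetch in providers:
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} RPC providers failed: {errors}")
```

Ordering the list by preference keeps the primary endpoint in use whenever it is healthy, while the collector only fails outright when every endpoint is down.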

Actions Taken

  • Initially, Aaron doubled the resources to address the resource constraint, hoping the collectors would catch up within a few hours.
  • Later, the gas price strategies were updated in production, and Vekil brought the collectors for Mantle, Base, and Blast up to date.
  • Slack escalation link.
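
Whether doubling resources lets a collector catch up depends on how its processing rate compares with the chain's block production rate. The sketch below makes that arithmetic explicit. The function name and all rates are hypothetical, and nothing here reflects measured values from this incident.

```python
import math


def catch_up_hours(backlog_blocks: int, collect_rate: float, produce_rate: float) -> float:
    """Hours needed to clear the backlog; both rates are in blocks per second.

    Returns math.inf when the collector is not faster than the chain,
    because the backlog then never shrinks.
    """
    net_rate = collect_rate - produce_rate
    if net_rate <= 0:
        return math.inf
    return backlog_blocks / net_rate / 3600
```

With illustrative numbers, a backlog of 36000 blocks on a chain producing 0.5 blocks per second clears in 10 hours at a collection rate of 1.5 blocks per second, and never clears at 0.5 blocks per second. Doubling resources only helps if the resulting collection rate ends up above the production rate.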

Incident Reviewer(s)

  • Arda
  • Aaron
  • Andrew
  • Abdel
  • Vekil